kohya-ss lora support #295


Merged: 17 commits into FP-Studio:develop from feature/kohya-ss-lora-support, Jul 22, 2025

Conversation

@arledesma (Member) commented Jun 28, 2025

Brings in support from kohya-ss's implementations in their FramePack-LoRAReady fork, as well as their contributions to FramePack-eichi

https://gist.github.com/kohya-ss/fa4b7ae7119c10850ae7d70c90a59277

https://github.com/kohya-ss/FramePack-LoRAReady/blob/3613b67366b0bbf4a719c85ba9c3954e075e0e57

https://github.com/kohya-ss/FramePack-eichi/blob/4085a24baf08d6f1c25e2de06f376c3fc132a470


Features

  • Add kohya-ss LoRAReady implementation, supporting additional LoRA formats
    • dropdown in settings supports both LoRA loaders
      • diffusers
      • lora_ready
  • Model (merged transformer) reuse optimization, reducing generation time
    • checkbox in settings, defaulted to off
    • model reuse occurs when all of the following hold across successive requests (see the sketch after this list):
      • exact same model
      • exact same lora(s) and associated weights
      • the second or subsequent request
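
A rough sketch of that reuse check (hypothetical names; GeneratorReuseManager and compute_lora_hash are not the PR's actual API, though the thread's logs do show a LoRA config hash being compared):

import hashlib
from typing import Callable


def compute_lora_hash(lora_names: list[str], lora_weights: list[float]) -> str:
    """Hash the LoRA selection and weights so identical successive requests can be detected."""
    payload = "|".join(f"{name}:{weight}" for name, weight in zip(lora_names, lora_weights))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


class GeneratorReuseManager:
    """Keeps the last merged transformer alive and reuses it when nothing has changed."""

    def __init__(self) -> None:
        self._last_model_name: str | None = None
        self._last_lora_hash: str | None = None
        self._generator = None

    def get_generator(self, model_name: str, lora_names: list[str], lora_weights: list[float], build: Callable):
        lora_hash = compute_lora_hash(lora_names, lora_weights)
        if (
            self._generator is not None
            and model_name == self._last_model_name
            and lora_hash == self._last_lora_hash
        ):
            # Second or subsequent identical request: skip the merge and reuse the transformer.
            return self._generator, True
        # First request, or something changed: rebuild and remember the new state.
        self._generator = build(model_name, lora_names, lora_weights)
        self._last_model_name = model_name
        self._last_lora_hash = lora_hash
        return self._generator, False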

UI

  • Settings: Experimental Settings collapsed accordion (see the UI sketch after this list)
    • Dropdown for lora loader
      • diffusers
      • lora_ready
    • Checkbox for model reuse
      • Unchecked by default
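
A minimal Gradio sketch of the settings described above (widget labels and variable names are illustrative, not necessarily the PR's exact ones):

import gradio as gr

with gr.Blocks() as settings_ui:
    with gr.Accordion("Experimental Settings", open=False):
        lora_loader = gr.Dropdown(
            choices=["diffusers", "lora_ready"],
            value="diffusers",  # existing behavior stays the default
            label="LoRA loader",
        )
        reuse_model = gr.Checkbox(
            value=False,  # unchecked by default
            label="Reuse merged transformer between generations",
        )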

Additional

  • Add fp8 support to convert LoRA
    • not wired up; will require supporting fp8 model weights, UI, and plumbing

LoRAs such as https://civitai.com/models/1518315/transporter-effect-from-star-trek-the-next-generation-or-hunyuan-video-lora work with this implementation, while failing with the existing (diffusers) implementation.

Under the covers

LoRA keys are named with a prefix of lora_unet_.
Any . in the LoRA module name is replaced with a _.
The keys then have the format lora_unet_{lora_name}.

The LoRA loader iterates through the LoRA dictionary keys, renaming them to a consistent naming scheme:

  1. When keys begin with diffusion_model or transformer AND end with lora_A, the dict is converted with convert_from_diffusion_pipe_or_something
  2. When keys include double_blocks or single_blocks, it is identified as HunyuanVideo and converted with convert_hunyuan_to_framepack

convert_from_diffusion_pipe_or_something renames diffusion_model keys from .lora_A. to .lora_down. and from .lora_B. to .lora_up.
convert_from_diffusion_pipe_or_something takes the rank from the first dimension of the lora_down weight
convert_from_diffusion_pipe_or_something adds an alpha equal to the rank for each weight
convert_hunyuan_to_framepack renames and splits keys in double_blocks and single_blocks
convert_hunyuan_to_framepack splits and slices fused QKV keys (or QKVM keys) into individual Q, K, V (or Q, K, V, M) keys
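
A simplified sketch of the renaming described above (this is not the actual convert_from_diffusion_pipe_or_something implementation, just the shape of the idea; the alpha-equals-rank behavior follows the description above):

import torch


def convert_diffusion_pipe_keys_sketch(lora_sd: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Rename diffusers-style keys (diffusion_model.*.lora_A/B) into lora_unet_* down/up keys."""
    new_sd: dict[str, torch.Tensor] = {}
    for key, value in lora_sd.items():
        # Drop the leading "diffusion_model." / "transformer." prefix.
        module_key = key.split(".", 1)[1] if "." in key else key
        # lora_A / lora_B become lora_down / lora_up.
        module_key = module_key.replace(".lora_A.", ".lora_down.").replace(".lora_B.", ".lora_up.")
        # The module path has its '.' flattened to '_' and gains the lora_unet_ prefix.
        module_path, sep, suffix = module_key.partition(".lora_")
        new_key = "lora_unet_" + module_path.replace(".", "_") + sep + suffix
        new_sd[new_key] = value
        # Rank is the first dimension of lora_down; alpha is stored equal to the rank.
        if new_key.endswith(".lora_down.weight"):
            alpha_key = new_key.replace(".lora_down.weight", ".alpha")
            new_sd[alpha_key] = torch.tensor(float(value.shape[0]))
    return new_sd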


fp8 is out of scope; the code path raises an exception when used, to flag it for future development.

@arledesma arledesma force-pushed the feature/kohya-ss-lora-support branch 4 times, most recently from 3570f32 to a8cd34e Compare July 1, 2025 02:40
@arledesma (Member, Author) commented:

It looks like I had pushed up half-broken code. It should now be in a working state.
Multiple LoRAs seem to behave fairly well together.


I'm not sure why the global current_generator was being referenced by importing from __main__, but it works just fine for me as a normal module-level global (see the sketch below).
In order to understand the values being passed around, I also needed to add some additional typing, which could really help clean up much of the codebase if you choose to go that way. Many of the critical paths now have typing, so you could springboard from these changes to remove string references to model_type, or feel free to just take the bits that you want. There are also quite a few unused variables that made it a little difficult to troubleshoot (and a few uninitialized variables that are possibly causing bugs) that should probably be reviewed for removal or proper use.
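
For context, a tiny sketch of the two patterns being contrasted (factory here is a hypothetical callable; this is not the actual studio code):

import __main__

# Pattern the existing code used: reach back into the entry-point module's globals.
generator_from_main = getattr(__main__, "current_generator", None)

# Pattern this PR moves towards: a plain module-level global referenced directly.
current_generator = None


def get_or_create_generator(factory):
    """Return the shared generator, creating it on first use via the hypothetical factory."""
    global current_generator
    if current_generator is None:
        current_generator = factory()
    return current_generator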

@arledesma (Member, Author) commented:

Work was started on reusing the existing transformer, which would have shaved off ~30 seconds per generation on my local machine when there were no changes to the base model or LoRA weights, but it would have required additional work that I do not currently have the time to put in.

@arledesma arledesma mentioned this pull request Jul 1, 2025
@arledesma arledesma force-pushed the feature/kohya-ss-lora-support branch 3 times, most recently from 4b4c2e1 to b130f16 Compare July 8, 2025 04:02
@arledesma (Member, Author) commented:

@colinurbs I went ahead and hacked in a manager to enable reuse of the existing transformer when there are no changes to the model or any weights. Without this additional change there is around 30-45 seconds of load time for the LoRAs when using kohya-ss's implementation.

It's pretty nice so far.

[screen recording: 2025-07-08T04-12-33-chrome_HphO5iKOjY]

These were generated at 256x256 for testing, but the LoRAs seem to be doing some heavy lifting.

250707_231212_812_2865_9.mp4
250707_231155_707_4831_9.mp4

@colinurbs (Member) commented:

This is fantastic work, thank you so much. I see you've left it as a draft. Is there any reason I shouldn't merge this into develop and start testing it?

@arledesma (Member, Author) commented:

@colinurbs no reason from my side not to merge into develop; I'll remove the draft status.
The PR is also marked to allow maintainers of this repo to edit it directly, so the team could work the PR branch directly without merging if you find merging too risky.

[screenshot]

@arledesma arledesma marked this pull request as ready for review July 9, 2025 16:12
@arledesma arledesma changed the title kohya ss lora support kohya-ss lora support Jul 9, 2025
@arledesma arledesma force-pushed the feature/kohya-ss-lora-support branch from 590cb22 to 1e6a32c Compare July 9, 2025 17:42

if current_generator is not None and current_generator.transformer is not None:
offload_model_from_device_for_memory_preservation(
current_generator.transformer, target_device=gpu, preserved_memory_gb=settings.get("gpu_memory_preservation", 8.0))
Review comment (Member):

In the inherited FP demo code from lllyasviel, this was explicitly preserved at 8GB, unlike most preservations, which used his setting that defaulted to 6GB. I don't know if this change is an issue, but it should be tested. (There's a second similar change below.)

offload_model_from_device_for_memory_preservation(studio_module.current_generator.transformer, target_device=gpu, preserved_memory_gb=8)
current_generator.move_lora_adapters_to_device(cpu)
offload_model_from_device_for_memory_preservation(
current_generator.transformer, target_device=gpu, preserved_memory_gb=settings.get("gpu_memory_preservation", 8.0))
Review comment (Member):

In the inherited FP demo code from lllyasviel, this was explicitly preserved at 8GB, unlike most preservations, which used his setting that defaulted to 6GB. I don't know if this change is an issue, but it should be tested. (There's a second similar change above.)


studio_manager.current_generator = current_generator = new_generator
# Load the transformer model
current_generator.load_model()
Review comment (Member):

It's not clear why we load here without moving the transformer to the GPU, but on line 296 a pre-existing transformer does go to the GPU.

f"Worker: AFTER model assignment, current_generator is {type(current_generator)}, id: {id(current_generator)}")
if current_generator:
print(
f"Worker: current_generator.transformer is {type(current_generator.transformer)}. load_model() will be called next.")
Review comment (Member):

The model is already loaded above.

Reply (Member, Author):

I think this was just some print debugging. I'll remove the noise.

@Xipomus (Member) commented Jul 17, 2025

It keeps downloading model files even when it has already found them:
Loading diffusion_pytorch_model-00001-of-00003.safetensors: ...
Model doesn't have any LoRA adapters or peft_config.
[After unloading LoRAs] Transformer has no peft_config attribute
[After unloading LoRAs] No LoRA components found in transformer
Loading LoRAs: ['D4nceCLub_e80_1561500', 'cyberpunk_1_epoch_1359380'] with values: [1, 1]
Previous LoRA config: None, Current LoRA config: ModelConfiguration(model_name='Original', settings=ModelSettings(lora_settings=[ModelLoraSetting(name='D4nceCLub_e80_1561500', weight=1.0, sequence=0, exclude_blocks=None, include_blocks=None), ModelLoraSetting(name='cyberpunk_1_epoch_1359380', weight=1.0, sequence=1, exclude_blocks=None, include_blocks=None)]))
Previous LoRA hash: , Current LoRA hash: f7eddeb183957386a46900faf001c42c
Loading LoRAs using kohya_ss loader from E:\Stable Diffusion-AI images\webui\models\Lora\Hunyuan
LoRA -> Found model files: ['E:\FramePack\framepack_cu126_torch26\webui\hf_download\hub\models--lllyasviel--FramePackI2V_HY\snapshots\86cef4396041b6002c957852daac4c91aaa47c79\diffusion_pytorch_model-00001-of-00003.safetensors', 'E:\FramePack\framepack_cu126_torch26\webui\hf_download\hub\models--lllyasviel--FramePackI2V_HY\snapshots\86cef4396041b6002c957852daac4c91aaa47c79\diffusion_pytorch_model-00002-of-00003.safetensors', 'E:\FramePack\framepack_cu126_torch26\webui\hf_download\hub\models--lllyasviel--FramePackI2V_HY\snapshots\86cef4396041b6002c957852daac4c91aaa47c79\diffusion_pytorch_model-00003-of-00003.safetensors']
LoRA loading: D4nceCLub_e80_1561500.safetensors (scale: 1)
LoRA loading: cyberpunk_1_epoch_1359380.safetensors (scale: 1)
Model architecture: HunyuanVideo
Diffusion-pipe (?) LoRA detected
HunyuanVideo LoRA detected, converting to FramePack format
Diffusion-pipe (?) LoRA detected
HunyuanVideo LoRA detected, converting to FramePack format
Merging LoRA weights into state dict. multiplier: [1, 1]
Loading diffusion_pytorch_model-00002-of-00003.safetensors: 59%|███████████▊ | 284/482 [00:30<00:18, 10.53it/s]

@colinurbs (Member) commented:

The first commit of this has been merged into develop and seems to be working well so far.

@arledesma (Member, Author) commented Jul 18, 2025

It keeps downloading model files even when it has already found them:

Loading diffusion_pytorch_model-00001-of-00003.safetensors: ...
...
Loading diffusion_pytorch_model-00002-of-00003.safetensors: 59%|███████████▊ | 284/482 [00:30<00:18, 10.53it/s]

This is loading the model from disk, not downloading it. The progress bar comes from tqdm.

@arledesma (Member, Author) commented Jul 18, 2025

@colinurbs you were just a bit too quick for me :D

I was working on minimizing the changes and exposing the settings in https://github.com/arledesma/temp-FramePack-Studio/commits/feature/kohya-ss-loraready which is from a couple of hours ago, around the time that you merged the commit to develop.

I have https://github.com/arledesma/temp-FramePack-Studio/commits/feature/kohya-ss-loraready-develop/ merged with the current develop.

Providing the choice for the loader and model reuse in the settings page seemed like an easier A/B test.

[screenshot]

Regardless, I'm good with anything that you want to do over here. 👍🏽

@arledesma (Member, Author) commented:

https://github.com/arledesma/temp-FramePack-Studio/commits/feature/kohya-ss-loraready-develop/ is currently unloading the LoRAs when reusing the model transformer, so only the first generation will apply LoRA weights.

Definitely not in a desirable state.

@colinurbs (Member) commented:

@arledesma thanks for this. I'm going to roll back develop and merge this whole thing in this evening. I haven't had time to dig into the code; did you add an exception for already-downloaded models to prevent it from forcing existing users to re-download?

@arledesma (Member, Author) commented:

@colinurbs I have yet to replicate any models being re-downloaded. I've attempted multiple times with different settings and even reinstalled the entire repository without experiencing the issue.

Could it just be what is mentioned in #295 (comment), where loading from disk is being misinterpreted as downloading? If so, maybe we can update the wording in lora_utils.load_safetensors_with_fp8_optimization() to something other than f"Loading {os.path.basename(model_file)}"?

import os
from typing import Callable

import torch
from tqdm import tqdm


def load_safetensors_with_fp8_optimization(
    model_files: list[str],
    fp8_optimization: bool,
    device: torch.device,
    weight_hook: Callable | None = None,
) -> dict[str, torch.Tensor]:
    """
    Load state dict from safetensors files and merge LoRA weights into the state dict with fp8 optimization if needed.
    """
    state_dict = {}
    if fp8_optimization:
        # fp8 is intentionally out of scope for now; everything below this raise is unreachable until fp8 support is wired up.
        raise RuntimeWarning("FP8 optimization is not yet supported in this version.")
        from .fp8_optimization_utils import (
            optimize_state_dict_with_fp8_on_the_fly,
        )

        # Optimization targets and exclusion keys
        TARGET_KEYS = ["transformer_blocks", "single_transformer_blocks"]
        EXCLUDE_KEYS = [
            "norm"
        ]  # Exclude norm layers (e.g., LayerNorm, RMSNorm) from FP8

        print(f"FP8: Optimizing state dictionary on the fly")
        # Optimized state dictionary in FP8 format
        state_dict = optimize_state_dict_with_fp8_on_the_fly(
            model_files,
            device,
            TARGET_KEYS,
            EXCLUDE_KEYS,
            move_to_device=False,
            weight_hook=weight_hook,
        )
    else:
        from .safetensors_utils import MemoryEfficientSafeOpen

        state_dict = {}
        for model_file in model_files:
            with MemoryEfficientSafeOpen(model_file) as f:
                for key in tqdm(
                    f.keys(),
                    desc=f"Loading {os.path.basename(model_file)}",
                    leave=False,
                ):
                    value = f.get_tensor(key)
                    if weight_hook is not None:
                        value = weight_hook(key, value)
                    state_dict[key] = value

    return state_dict
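
For example (purely a suggestion, not something this PR settled on), the same tqdm call with the disk read made explicit in the description:

                for key in tqdm(
                    f.keys(),
                    desc=f"Loading {os.path.basename(model_file)} from disk",  # clearer wording, hypothetical
                    leave=False,
                ):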

@colinurbs (Member) commented:

Ok, I'll try this again tonight and see if it happens for me. I believe it's due to the slightly different structure being used in the hf cache folder. But I'll let you know.

@RT-Borg (Member) commented Jul 18, 2025

@colinurbs I have yet to replicate any models being re-downloaded. I've attempted multiple times with different settings and even reinstalled the entire repository without experiencing the issue.

@arledesma see my long message on discord #testers last night (that I @'d you on) for a detailed description of why this change in studio.py caused me re-downloads: https://github.com/FP-Studio/framepack-studio/pull/295/files#diff-05934289eba73cfacb716666819e900c0a2212ad0c7952f5cbb014617b3b739bR26

@arledesma (Member, Author) commented:

@RT-Borg

@arledesma see my long message on discord #testers last night (that I @'d you on) for a detailed description of why this change in studio.py caused me re-downloads: https://github.com/FP-Studio/framepack-studio/pull/295/files#diff-05934289eba73cfacb716666819e900c0a2212ad0c7952f5cbb014617b3b739bR26

Ahh, I see. It's almost odd that users with an explicitly set HF_HOME were not demanding that it be honored and patching studio.py with each change. I had been manually adding it each time I pulled, and it just got lost in the original PR (as I didn't have the time to finish the work and pushed everything from my local branch).

Maybe, in the future, we can add a CLI flag to skip setting the environment variable and let the SDK define how it behaves.

arledesma added 15 commits July 18, 2025 18:17
Brings in support from kohya-ss's implementations in their FramePack-LoRAReady fork, as well as their contributions to FramePack-eichi (which do not seem to be correctly attributed to kohya-ss in their primary eichi repo)

https://gist.github.com/kohya-ss/fa4b7ae7119c10850ae7d70c90a59277

https://github.com/kohya-ss/FramePack-LoRAReady/blob/3613b67366b0bbf4a719c85ba9c3954e075e0e57

https://github.com/kohya-ss/FramePack-eichi/blob/4085a24baf08d6f1c25e2de06f376c3fc132a470
We do not manage multiple VideoJobQueues, so this singleton can be imported and used anywhere that we need access to the queue
Enable switching between known lora loaders

Includes a StrEnum implementation for Python 3.10 (or older 3.x) users; otherwise the builtin StrEnum from Python >= 3.11 is used (see the sketch below)
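
A sketch of that compatibility shim (the LoraLoader enum name is illustrative; its values mirror the dropdown options above):

import sys

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:
    from enum import Enum

    class StrEnum(str, Enum):
        """Minimal stand-in for Python 3.10 and older; mimics the 3.11 builtin."""

        def __str__(self) -> str:
            return str(self.value)


class LoraLoader(StrEnum):
    DIFFUSERS = "diffusers"
    LORA_READY = "lora_ready"
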
We do not manage multiple Settings objects, so this singleton can be imported and used anywhere that we need access to the Settings
These default to the existing behavior: the diffusers LoRA loader and no reuse of the model instance
…ransformer3DModel

This was existing behavior that was found not to be mapped.
This was leading to the model being downloaded again for users who did have HF_HOME set to a value.

We will need to document the migration path for existing users to avoid re-downloading the models.
@RT-Borg (Member) commented Jul 19, 2025

Ahh, I see. It's almost odd that users with an explicitly set HF_HOME were not demanding that it be honored and patching studio.py with each change.

Pinokio users (like me) have it set for them (via the Pinokio ENVIRONMENT) without even realizing it. It's possible some others have set it for other AI projects and wouldn't even realize we use it. I know some users with multiple installs have mentioned on Discord that they use junction links to avoid multiple model copies.

But I think it just hasn't been on that many people's radars.

Maybe, in the future, we can add a CLI flag to skip setting the environment variable and let the SDK define how it behaves.

That sounds reasonable, but it's a little outside my lane to weigh in until I learn more about the options and conventions. Whatever we do, we just need to take care that legacy users don't download another 80GB (and maybe not even know if or where there are duplicates to delete).

When loaded from a queue import, the selected_loras and lora_values lists are equal in length.
When loaded from the Generate interface, lora_values has the same length as lora_loaded_names and must be reduced.

This change assumes that when the two lists are the same length, they are in the correct order (see the sketch below).
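
A sketch of the length reconciliation described in that commit (align_lora_values is a hypothetical helper; selected_loras, lora_values, and lora_loaded_names are the names used in the commit message):

def align_lora_values(selected_loras: list[str], lora_values: list[float], lora_loaded_names: list[str]) -> list[float]:
    """Return one weight per selected LoRA, regardless of which path supplied the values."""
    if len(lora_values) == len(selected_loras):
        # Queue-import path: the lists already line up; assume they are in the correct order.
        return list(lora_values)
    # Generate-interface path: lora_values covers every loaded LoRA,
    # so pick out only the values for the LoRAs that were actually selected.
    return [lora_values[lora_loaded_names.index(name)] for name in selected_loras]
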
diffusers uses this environment variable to automatically download files on import; it is a weird side effect to do that much actual work at import time.
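
A sketch of one way to respect an existing cache location (the ./hf_download default mirrors the path visible in the logs above; the exact handling in the PR may differ):

import os

# Respect an existing HF_HOME rather than overwriting it, so users who already
# have a populated cache are not forced to re-download ~80GB of models.
os.environ.setdefault("HF_HOME", os.path.abspath("./hf_download"))

# Per the commit note above, diffusers reads HF_HOME at import time, so the
# environment variable has to be set before this import happens.
import diffusers  # noqa: E402
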
@arledesma arledesma force-pushed the feature/kohya-ss-lora-support branch from b044cb2 to a8cba73 Compare July 19, 2025 02:23
@Xipomus (Member) commented Jul 19, 2025

So to sum up testing for now:

  • queue import fixed
    • no more crashes on the 2nd render
  • not seeing the model load all the time
    • better communication when the job starts, so you know what is happening

In Settings, under Experimental Settings, the LoRA loader needs to be set to lora_ready (you might want to make this the default).
Otherwise FramePack LoRAs, and some Hunyuan Video LoRAs, will have issues loading.

  • selecting a LoRA after one video has rendered sets the LoRA weight to 0
  • sometimes Gradio doesn't jump to the next render, but that could be a Gradio thing

@arledesma (Member, Author) commented:

Evaluating whether we should default to the new loader.

Pros:

  • no user interaction required
  • silently support additional LoRA formats by default
  • automatically provide better support for mixing LoRAs in some cases

Cons:

  • Importing or re-running the same generation, with the same values, may not reproduce the outcome of the initial generation
  • Unknown bugs may affect a larger user base
  • Changing default behavior(s) may negatively impact expected user experiences
    • when mixing LoRAs, the single and double blocks may conflict, leading to extreme distortions (this may not be noticed with diffusers, since the conflicting weights may not even be loaded)

I'm sure that there are additional pros and cons.

@colinurbs colinurbs merged commit 16ddd3d into FP-Studio:develop Jul 22, 2025
@arledesma arledesma deleted the feature/kohya-ss-lora-support branch July 22, 2025 22:34